RISK HULL METHOD AND REGULARIZATION BY
PROJECTIONS OF ILL-POSED INVERSE PROBLEMS

By L. Cavalier and Yu. Golubev 

Université de Provence (Aix-Marseille 1)

We study a standard method of regularization by projections of
the linear inverse problem $Y = Af + \epsilon$, where $\epsilon$ is a white
Gaussian noise, and $A$ is a known compact operator with singular values
converging to zero with polynomial decay. The unknown function $f$ is
recovered by a projection method using the singular value decomposition
of $A$. The bandwidth choice of this projection regularization is
governed by a data-driven procedure which is based on the principle
of risk hull minimization. We provide nonasymptotic upper bounds
for the mean square risk of this method and we show, in particular,
that in numerical simulations this approach may substantially
improve on the classical method of unbiased risk estimation.

1. Introduction and main result. The inverse problem paradigm is related
to the classical linear algebra problem in which we want to find a
solution $x \in \mathbb{R}^d$ of the linear equation

(1.1)    $Ax = y,$

where $A$ is a known $d \times d$ matrix and $y$ is a given vector in
$\mathbb{R}^d$. From a mathematical viewpoint, the linear inverse problem can
be considered a straightforward generalization of (1.1). Let $H$, $G$ be two
Hilbert spaces and let $A$ be a continuous linear operator $H \to G$. Suppose
we have at our disposal an element (a function) defined by

(1.2)    $Y = Af + \epsilon,$

where $\epsilon$ is an unknown function which is small. The goal is to recover
$f \in H$.

Numerous applications of inverse problems in medical image processing,
econometrics and astrophysics make this area very attractive for
mathematical incursions. The mathematical literature on inverse problems is
so vast that it would be impractical to cite it here. We refer the interested
readers to [3, 10, 14], where interesting applications of inverse problems can
be found.

In the last two decades, the stochastic approach, which goes back to [18],
has been very intensively studied in the statistical literature (see, e.g., [5, 7,
8, 9, 11, 16, 17, 19, 21]). In this approach, it is usually assumed that $\epsilon$ is a
Gaussian white noise in EI (see, for details, [15]). 

The simplest way to understand why the problem (1.2) may be difficult
is to look at the singular value decomposition (SVD) of $A$. Let $A^*$ be the
adjoint of $A$. Suppose $A^*A$ is a compact operator with eigenvalues
$\lambda_k > 0$, $k = 1, 2, \ldots$, and eigenfunctions $\varphi_k$,
$k = 1, 2, \ldots$. Let $\psi_k = A\varphi_k/\|A\varphi_k\|$. Then
we get the following equivalent representation of (1.2):

(1.3)    $y_k = \theta_k + \sigma_k\xi_k, \qquad k = 1, 2, \ldots,$

where $\xi_k$ are i.i.d. $\mathcal{N}(0,1)$, $y_k = \langle Y, \psi_k\rangle/\sqrt{\lambda_k}$,
$\theta_k = \langle f, \varphi_k\rangle$, $\sigma_k = \varepsilon/\sqrt{\lambda_k}$, and
$\varepsilon$ is a known spectral density of the Gaussian white noise $\epsilon$.

Ill-posed inverse problems are characterized by the fundamental property
that $\sigma_k \to \infty$ as $k \to \infty$, and the behavior of $\sigma_k$ for
large $k$ describes the difficulty of the inverse problem. In this paper we will
deal with moderately ill-posed inverse problems with polynomially increasing
$\sigma_k$ ($\sigma_k \asymp k^\beta$, $\beta > 0$). Recall that in the statistical
literature this type of inverse problem is often associated with estimation of
the derivative of order $\beta$ of a regression function.

The fact that $\sigma_k \to \infty$ immediately entails that the natural inversion

(1.4)    $A^{-1}Y = \sum_{k:\,\lambda_k > 0} \lambda_k^{-1/2}\langle Y, \psi_k\rangle\varphi_k$

cannot be used since the quadratic risk of this method is infinite. A standard
way to overcome this difficulty is based on a regularization technique.
Nowadays the family of regularization methods available for practical
applications is very large; see [10] and [23]. In the present paper, we will
focus on regularization by projections. The idea of this method is very simple.
In order to invert $A$, let us use the first $N$ terms of the expansion (1.4). In
other words, to recover $f$ or equivalently $\theta_k$, $k = 1, 2, \ldots$, in the
model (1.3), we use the projection method

(1.5)    $\hat\theta_k(N) = y_k\mathbf{1}\{k \le N\}.$

The mean square risk of this inversion method is computed very easily:

(1.6)    $R(\theta, N) = E_\theta\|\hat\theta(N) - \theta\|^2 = \sum_{k=N+1}^{\infty}\theta_k^2 + \sum_{k=1}^{N}\sigma_k^2.$

The parameter $N$ here is called the bandwidth and the major statistical
problem is related to the data-driven choice of $N$. Roughly speaking, the
goal of this choice is to minimize the right-hand side of (1.6) based on the
noisy data $y_k$ from (1.3).
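The bias-variance trade-off in (1.6) can be made concrete with a few lines of
code. The following is a minimal sketch, not part of the original paper: the
function names, the truncation level `K` and the example sequence
$\theta_k = 1/k^2$ are assumptions chosen only for illustration.

```python
def projection_risk(theta, sigma, n):
    """Risk (1.6): squared bias of the dropped coefficients plus variance of the kept ones."""
    bias2 = sum(t * t for t in theta[n:])
    variance = sum(s * s for s in sigma[:n])
    return bias2 + variance


def oracle_bandwidth(theta, sigma):
    """Bandwidth N in 1..len(theta) minimizing the risk (1.6)."""
    return min(range(1, len(theta) + 1),
               key=lambda n: projection_risk(theta, sigma, n))


# Hypothetical example: theta_k = 1/k^2 and sigma_k = eps * k (first-order derivative).
eps = 0.05
K = 50  # truncation level of the sequence model, an assumption of this sketch
theta = [1.0 / k ** 2 for k in range(1, K + 1)]
sigma = [eps * k for k in range(1, K + 1)]
n_star = oracle_bandwidth(theta, sigma)
print(n_star, projection_risk(theta, sigma, n_star))
```

The oracle bandwidth computed here requires the unknown $\theta$, which is
exactly why a data-driven choice of $N$ is needed.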

A classical approach to this minimization problem is based on the principle
of unbiased risk estimation (URE) (see [22]). The idea to use this method
for adaptive bandwidth choice goes back to [1] and [20]. Originally, URE was
proposed in the context of regression estimation ($\sigma_k \equiv \varepsilon$).
Nowadays, it is used as a basic adaptation tool for many statistical models.
For inverse problems, this method was studied in [5], where precise oracle
inequalities for the mean square risk were obtained.

The heuristic motivation of URE is rather simple. The underlying
optimization problem can be reformulated as minimization of
$-\sum_{k=1}^{N}\theta_k^2 + \sum_{k=1}^{N}\sigma_k^2$ [see (1.6)]. Noticing that the
unobservable term $\sum_{k=1}^{N}\theta_k^2$ can be estimated by
$\sum_{k=1}^{N}(y_k^2 - \sigma_k^2)$, we choose the bandwidth as

(1.7)    $N_{\rm ure}(y) = \arg\min_N R(y, N)$ where $R(y, N) = -\sum_{k=1}^{N} y_k^2 + 2\sum_{k=1}^{N}\sigma_k^2.$

Intuitively, since $N_{\rm ure}(y)$ minimizes the estimator of the risk, it means
that the risk of the method $E_\theta\|\hat\theta(N_{\rm ure}) - \theta\|^2$ can be
controlled by the risk of the best projection method $\inf_N R(\theta, N)$, which
is sometimes called the risk of the oracle. Following [4], we measure the
quality of the method $\hat\theta(N_{\rm ure})$ by the ratio of its risk to the risk
of the oracle,

(1.8)    $r(\theta) = \dfrac{E_\theta\|\hat\theta(N_{\rm ure}) - \theta\|^2}{\inf_N R(\theta, N)}.$

When we use URE we hope that $r(\theta)$ is bounded from above by a relatively
small constant uniformly over all $\theta$. It is well known that this naive
hypothesis holds (see [4]) for direct estimation ($\sigma_k \equiv \varepsilon$).
However, when we deal with an inverse problem the situation becomes more
difficult.

In order to illustrate the difference between direct and inverse estimation,
we will carry out a very simple numerical experiment. Obviously, we cannot
compute in a numerical experiment $r(\theta)$ for all $\theta \in \ell_2$. Therefore,
let us take $\theta_k \equiv 0$ and compute $r(0)$ for two cases,
$\sigma_k = \varepsilon$ and $\sigma_k = \varepsilon k$. The first case
corresponds to classical regression function estimation (direct estimation),
whereas the second is related to the estimation of the first-order derivative
of a regression function. Notice that in both cases the risk of the oracle
is evidently $\inf_N R(0, N) = \varepsilon^2$ since $\arg\min_N R(0, N) = 1$. In
order to shed some light on the performance of URE, we generated 2000
independent random vectors $y^j$, $j = 1, \ldots, 2000$, with the components
defined by (1.3). For each vector we computed $N_{\rm ure}(y^j)$ and the
normalized error $\|\hat\theta[N_{\rm ure}(y^j)] - \theta\|^2/\varepsilon^2$ and
plotted these values as a stem diagram. We also computed the
Fig. 1. The method of unbiased risk estimation.

Let us discuss briefly the numerical results of this experiment shown in
Figure 1. The first display (direct estimation) shows that the URE method
works reasonably well. Almost all bandwidths $N_{\rm ure}(y^j)$ are relatively
small (their mean is 1.98) and $r(0) = 3.72$. Even a quick look at the second
display shows that the distribution of $N_{\rm ure}(y^j)$ changes substantially.
Now the mean is 5.95 and there are sufficiently many bandwidths
$N_{\rm ure}(y^j)$ greater than 20. This results in a catastrophic
$r(0) \approx 2000$. On the other hand, it follows from the oracle inequalities
(see [5, 6, 7] or Theorem 4 of the present paper) that in both cases there
exist a lot of $\theta$ for which $r(\theta) \approx 1$. Comparing this fact
with the simulations, we can conclude that for ill-posed inverse problems,
URE does not work properly since very large $r(0)$ undermines its basic idea.
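An experiment of this kind can be repeated with a short script. The sketch
below is an illustration under stated assumptions, not the authors' code: the
truncation level `k_max = 50`, the seed and the function names are hypothetical,
and the numbers it prints need not coincide with those reported in the text.

```python
import random


def ure_bandwidth(y, sigma):
    """N minimizing the URE criterion (1.7): -sum y_k^2 + 2 sum sigma_k^2."""
    best_n, best_val, crit = 1, float("inf"), 0.0
    for n, (yk, sk) in enumerate(zip(y, sigma), start=1):
        crit += 2.0 * sk * sk - yk * yk
        if crit < best_val:
            best_n, best_val = n, crit
    return best_n


def ure_ratio_at_zero(sigma, eps, n_rep, rng):
    """Monte Carlo estimate of r(0) from (1.8) and of the mean URE bandwidth."""
    total_loss, total_n = 0.0, 0
    for _ in range(n_rep):
        y = [s * rng.gauss(0.0, 1.0) for s in sigma]  # theta = 0 in (1.3)
        n = ure_bandwidth(y, sigma)
        total_loss += sum(v * v for v in y[:n])       # ||theta_hat(N) - 0||^2
        total_n += n
    # The oracle risk at theta = 0 is eps^2, attained at N = 1.
    return total_loss / (n_rep * eps ** 2), total_n / n_rep


rng = random.Random(0)
eps, k_max = 1.0, 50  # k_max is an assumed truncation, not taken from the paper
direct = [eps] * k_max
inverse = [eps * k for k in range(1, k_max + 1)]
r_direct, n_direct = ure_ratio_at_zero(direct, eps, 2000, rng)
r_inverse, n_inverse = ure_ratio_at_zero(inverse, eps, 2000, rng)
print(r_direct, n_direct, r_inverse, n_inverse)
```

The qualitative effect to look for is that the estimated ratio in the inverse
case is far larger than in the direct case.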
There exists a more general approach which is very close to URE. This
method is called the method of penalized empirical risk, and in the context of
our problem it provides us with the bandwidth choice

(1.9)    $\bar N(y) = \arg\min_{N \ge 1} R_{\rm pen}(y, N), \qquad R_{\rm pen}(y, N) = -\sum_{k=1}^{N} y_k^2 + \sum_{k=1}^{N}\sigma_k^2 + {\rm pen}(N),$

where ${\rm pen}(N)$ is a penalty function. The modern literature on this method
is vast and we refer the interested reader to [2] or [4]. The main idea at
the heart of this approach is that severe penalties permit one to improve
substantially the performance of URE. For instance, it is known that this
approach works well for severely ill-posed problems, where URE completely
fails (see, e.g., [12]). However, it should be mentioned that the principal
difficulty of this method is related to the choice of the penalty function
${\rm pen}(N)$.

In this paper we propose a more general approach, called risk hull
minimization (RHM), which gives a relatively good strategy for the penalty
choice. Our goal is to present heuristic and mathematical justifications of
this method. In the framework of the empirical risk minimization RHM can
be defined as follows. Let the penalty in (1.9) be

(1.10)    ${\rm pen}(N) = {\rm pen}_{\rm rhm}(N) = \sum_{k=1}^{N}\sigma_k^2 + (1 + \alpha)U_0(N),$

where

(1.11)    $U_0(N) = \inf\{t > 0 : E\,\eta_N\mathbf{1}\{\eta_N > t\} \le \sigma_1^2\}$ with $\eta_N = \sum_{i=1}^{N}\sigma_i^2(\xi_i^2 - 1).$

RHM chooses the bandwidth $N_{\rm rhm}(y) = \bar N(y)$ according to (1.9) with the
penalty function defined by (1.10), (1.11). The following theorem provides
an upper bound for the mean square risk of this approach. Recall that it is
assumed that $\sigma_k$ has polynomial growth ($\sigma_k \asymp \varepsilon k^\beta$);
see, for details, (2.5) and (2.6).

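Theorem 1. There exist constants $C_* > 0$ and $\gamma_0 > 0$ such that for all
$\gamma \in (0, \gamma_0]$ and $\alpha > 1$

(1.12)    $E_\theta\|\hat\theta(N_{\rm rhm}) - \theta\|^2 \le (1 + \gamma)\inf_N R_{\rm rhm}(\theta, N) + C_*\sigma_1^2\Big(\dfrac{1}{\gamma^{4\beta+1}} + \dfrac{1}{\alpha - 1}\Big),$

where $R_{\rm rhm}(\theta, N) = \sum_{k=N+1}^{\infty}\theta_k^2 + \sum_{k=1}^{N}\sigma_k^2 + (1 + \alpha)U_0(N)$.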
The statistical sense of Theorem 1 is rather transparent. The principal
term of this upper bound is $\inf_N R_{\rm rhm}(\theta, N)$. The residual term

    $\min_\gamma\Big\{\gamma\inf_N R_{\rm rhm}(\theta, N) + \dfrac{C_*\sigma_1^2}{\gamma^{4\beta+1}}\Big\} + \dfrac{C_*\sigma_1^2}{\alpha - 1}$

    $= C_*\sigma_1^2\big[(4\beta+1)^{1/(4\beta+2)} + (4\beta+1)^{-(4\beta+1)/(4\beta+2)}\big]\Big[\dfrac{\inf_N R_{\rm rhm}(\theta, N)}{C_*\sigma_1^2}\Big]^{(4\beta+1)/(4\beta+2)} + \dfrac{C_*\sigma_1^2}{\alpha - 1}$

defines how much we should pay for stochastic minimization. Using this
theorem we can get a typical panorama of minimax facts related to moderately
ill-posed problems (see [5]). Moreover, simulations in Section 3 reveal that
the constant $C_*$ is really small. It means, in particular, that in contrast to
URE this method is stable.
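The closed form of the residual term comes from a one-variable minimization
over $\gamma$. The short derivation below is a sketch; the shorthand $a$, $b$,
$p$ is introduced here only for readability and does not appear in the paper,
and the constraint $\gamma \le \gamma_0$ is ignored.

```latex
% Shorthand (not in the original): a = \inf_N R_{\rm rhm}(\theta,N),
% b = C_*\sigma_1^2, p = 4\beta + 1.
\[
g(\gamma) = \gamma a + b\,\gamma^{-p}, \qquad
g'(\gamma) = a - p\,b\,\gamma^{-p-1} = 0
\;\Longrightarrow\;
\gamma_\ast = \Big(\frac{p\,b}{a}\Big)^{1/(p+1)},
\]
\[
g(\gamma_\ast)
 = b\,\big[p^{1/(p+1)} + p^{-p/(p+1)}\big]\Big(\frac{a}{b}\Big)^{p/(p+1)} .
\]
```

With $p + 1 = 4\beta + 2$ this is the bracketed expression of the residual term,
so the price of adaptation grows like the oracle risk to the power
$(4\beta+1)/(4\beta+2) < 1$.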

The present paper is organized as follows. In Section 2, a heuristic moti- 
vation and additional facts related to RHM are presented. Section 3 contains 
simulation results. The proofs and technical lemmas are postponed to Sec- 
tion 4. 



2. The RHM method. 



2.1. A heuristic motivation. The heuristic motivation of the RHM approach
is based on the oracle ideology. Suppose there is an oracle which
provides us with $\theta_k$, $k = 1, 2, \ldots$, but we are allowed to use only
projection methods. In this case the optimal bandwidth is evidently given by

    $N_{\rm or} = \arg\min_N r(y, N)$ where $r(y, N) = \|\hat\theta(N) - \theta\|^2.$

Let us try to mimic this bandwidth choice. At first glance this problem
seems hopeless since in the decomposition

    $r(y, N) = \sum_{k=N+1}^{\infty}\theta_k^2 + \sum_{k=1}^{N}\sigma_k^2\xi_k^2$

neither $\theta_k^2$ nor $\xi_k^2$ is really known. However, suppose for a moment
that we know all the $\theta_k^2$, and we try to minimize $r(y, N)$. Since the
$\xi_k^2$ are assumed to be unknown, we can use a conservative minimization. It
means that we minimize the nonrandom functional

(2.1)    $\ell(\theta, N) = \sum_{k=N+1}^{\infty}\theta_k^2 + V(N),$

where $V(N)$ bounds from above the stochastic term $\sum_{k=1}^{N}\sigma_k^2\xi_k^2$.
It seems natural to choose this function such that

(2.2)    $E\sup_N\Big[\sum_{k=1}^{N}\sigma_k^2\xi_k^2 - V(N)\Big] \le 0,$

since then we can easily control the risk of any projection estimator with a
data-driven bandwidth $\bar N$,

(2.3)    $E_\theta\|\hat\theta(\bar N) - \theta\|^2 \le E_\theta\,\ell(\theta, \bar N).$

This motivation leads to the following definition: a nonrandom function
$\ell(\theta, N)$ such that $E_\theta\sup_N[r(y, N) - \ell(\theta, N)] \le 0$ is
called a risk hull.

Thus, we can say that $\ell(\theta, N)$ defined by (2.1) and (2.2) is a risk hull.
Evidently, we want to have the upper bound (2.3) as small as possible. So,
we are looking for the minimal hull. Note that this hull strongly depends on
$\sigma_k^2$ and we present in the sequel a numerical recipe to compute it.

Once $V(N)$ satisfying (2.2) has been chosen, the minimization of
$\ell(\theta, N)$ can be completed in the standard way by using unbiased
estimation. Note that our problem is reduced to minimization of
$-\sum_{k=1}^{N}\theta_k^2 + V(N)$. Replacing the unknown $\theta_k^2$ by their
unbiased estimates $y_k^2 - \sigma_k^2$, we arrive at the following method of
adaptive bandwidth choice:

    $\bar N = \arg\min_N\Big[-\sum_{k=1}^{N} y_k^2 + \sum_{k=1}^{N}\sigma_k^2 + V(N)\Big].$

A cornerstone idea of this approach is that we can find a function $V(N)$
such that the data-driven $\bar N$ minimizes the risk hull $\ell(\theta, N)$ without
significant losses, that is,

    $E_\theta\,\ell(\theta, \bar N) \le \min_N \ell(\theta, N) + \text{small term}.$

Therefore, combining this with (2.3), we get the inequality

(2.4)    $E_\theta\|\hat\theta(\bar N) - \theta\|^2 \le \min_N \ell(\theta, N) + \text{small term},$

which represents a heuristic version of an oracle inequality for the RHM
method.

Notice that when the risk is measured by the $\ell_2$-norm, RHM coincides with
the empirical risk minimization approach which is usually used in model
selection [4]. The major issue of model selection is the choice of a good
penalization. In the framework of the RHM approach, this problem can be
rephrased as follows: to find the minimal risk hull, which can be minimized
based on the data. We do not believe that there is a good general formula for
the optimal risk hull or for the penalty. What we can really do is to make
use of the Monte Carlo method to compute an approximation of this hull.
The goal of the present paper is to demonstrate that this approach works
well for the regularization by projections.
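The selection rule of this subsection is a plain minimization over $N$ once a
function $V(N)$ is supplied. The sketch below illustrates this under stated
assumptions: the function names, the data and the factor 3 in `heavier_term`
are hypothetical, and `heavier_term` is only a stand-in for a larger penalty,
not the RHM penalty itself.

```python
def select_bandwidth(y, sigma, hull_term):
    """Minimize -sum_{k<=N} y_k^2 + sum_{k<=N} sigma_k^2 + V(N) over N = 1..len(y).

    hull_term(n) is a user-supplied V(n), assumed to bound sum sigma_k^2 xi_k^2.
    """
    best_n, best_val, partial = 1, float("inf"), 0.0
    for n, (yk, sk) in enumerate(zip(y, sigma), start=1):
        partial += sk * sk - yk * yk
        value = partial + hull_term(n)
        if value < best_val:
            best_n, best_val = n, value
    return best_n


# Hypothetical data: two large coefficients followed by noise-level values.
sigma = [0.1 * k for k in range(1, 9)]
y = [3.0, 2.0, 0.5, -0.3, 0.2, 0.5, -0.4, 0.6]


def ure_term(n):
    """V(N) = sum sigma_k^2, which reproduces the URE rule (1.7)."""
    return sum(s * s for s in sigma[:n])


def heavier_term(n):
    """An illustrative heavier V(N); the factor 3 is arbitrary."""
    return 3.0 * ure_term(n)


print(select_bandwidth(y, sigma, ure_term), select_bandwidth(y, sigma, heavier_term))
```

On this toy data the heavier term selects a smaller bandwidth than the URE
term, which is the mechanism by which a larger penalty guards against
overfitting noisy high-index coefficients.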
2.2. Statistical model and assumptions. In the sequence space model
(1.3), we supposed that $\sigma_k^2$ is a polynomially increasing sequence with
$\sigma_1^2 > 0$. To be more precise, it is assumed that this sequence satisfies
the following hypothesis.

Polynomial hypothesis. There exist constants $C_1, C_2, C_3$ such that
for some $\beta \ge 0$ and for all $k \ge 1$

/ , 2fe \ 1/2 / \ 2/3/(2/3+1) 



For any integer s > 1 

. k / 1 \ {2s/3+i)/(2/3+i) 

^1 i=i V^i i=i / 

Let us comment very briefly on these assumptions. Assumption (2.5)
means that $\sigma_k^2$ can have only polynomial growth. Indeed, since
$x^{1/(2\beta+1)}$ is a concave function, we have by (2.5)

1/(2/3+1) / , k-1 \ 1/(2/3+1) 




'1 i=i I V'^i i=\ 

and summing up these formulas, one can easily check that 

2/^ 2/C^2(A;-l)\2^ , ^ 2 



(2.7) 



Thus $\sigma_k$ can have only polynomial growth of order $\beta$, which we will
call the degree of the inverse problem.

2.3. A risk hull. The main ingredient of RHM is the function $U_0(k)$,
$k = 1, 2, \ldots$, defined by (1.11). The simplest way to compute it is to make
use of the Monte Carlo method. It should be mentioned that this method is time
consuming since this function is related to large deviations of $\eta_k$. Lemma
1 below gives an asymptotic approximation for $U_0(k)$, but we will see that
this approximation is not good for small $k$. Therefore we prefer to use the
nonasymptotic formula (1.11) in our approach. It should be mentioned that
the performance of RHM is sufficiently stable with respect to small
perturbations of $U_0(k)$. Denote for brevity

    $\Sigma_N = \sum_{k=1}^{N}\sigma_k^4$ and $u_0(N) = \dfrac{U_0(N)}{\sqrt{2\Sigma_N}}.$
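A Monte Carlo computation of $U_0(N)$ from (1.11) can be sketched as follows.
This is an illustration under stated assumptions rather than the authors'
implementation: the function names, the number of draws and the seed are
hypothetical. For comparison the sketch also evaluates the Gaussian
approximation $\sqrt{2\Sigma_N\log[\Sigma_N/(\pi\sigma_1^4)]}$ that is discussed
later in this section.

```python
import math
import random


def u0_monte_carlo(sigma2, n_draws, rng):
    """Monte Carlo estimate of U_0(N) in (1.11), with N = len(sigma2).

    sigma2 holds sigma_1^2, ..., sigma_N^2.
    """
    etas = sorted(
        (sum(s2 * (rng.gauss(0.0, 1.0) ** 2 - 1.0) for s2 in sigma2)
         for _ in range(n_draws)),
        reverse=True,
    )
    tail_sum = 0.0  # running sum of the draws strictly above the candidate t
    for eta in etas:
        if eta <= 0.0:
            break
        if (tail_sum + eta) / n_draws > sigma2[0]:
            return eta  # smallest t whose empirical tail mean is <= sigma_1^2
        tail_sum += eta
    return 0.0


def u0_gaussian(sigma2):
    """Gaussian approximation sqrt(2 Sigma_N log(Sigma_N / (pi sigma_1^4)))."""
    big_sigma = sum(s2 * s2 for s2 in sigma2)
    arg = big_sigma / (math.pi * sigma2[0] ** 2)
    return math.sqrt(2.0 * big_sigma * math.log(arg)) if arg > 1.0 else 0.0


rng = random.Random(1)
sigma2 = [float(k * k) for k in range(1, 21)]  # sigma_k = k, i.e. eps = 1, N = 20
print(u0_monte_carlo(sigma2, 20000, rng), u0_gaussian(sigma2))
```

Sorting the draws once and scanning the tail sums avoids a separate search
over $t$; the accuracy is limited by the number of draws because the estimate
depends on the upper tail of $\eta_N$.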
Lemma 1. There exists an integer $N_0 \ge 1$ such that for all $N \ge N_0$

(2.8)    $u_0(N) \ge u_1(N) = \sqrt{\log\big(\Sigma_N/(2\pi\sigma_1^4)\big)}.$

This fact plays a principal role in the proof of the following theorem.

Theorem 2. There exists a constant $C_*$ such that for any $\alpha > 0$

(2.9)    $\ell_{\rm rhm}(\theta, N) = \sum_{k=N+1}^{\infty}\theta_k^2 + \sum_{k=1}^{N}\sigma_k^2 + (1 + \alpha)U_0(N) + \dfrac{C_*\sigma_1^2}{\alpha}$

is a risk hull, that is, $E\sup_N[r(y, N) - \ell_{\rm rhm}(\theta, N)] \le 0$.



This theorem says that uniformly in $N$, the loss $r(y, N)$ can be bounded
by the risk hull $\ell_{\rm rhm}(\theta, N)$. Thus, for any data-dependent $\bar N$,
we can bound the risk of the projection regularization method by the
expectation of the risk hull [see (2.3)].

We have mentioned that the URE and RHM methods can be viewed as
minimizers of the penalized empirical risk [see (1.9)]. While the penalty
corresponding to URE is given by ${\rm pen}_{\rm ure}(N) = \sum_{k=1}^{N}\sigma_k^2$,
the RHM method has the larger penalty
${\rm pen}_{\rm rhm}(N) = \sum_{k=1}^{N}\sigma_k^2 + (1 + \alpha)U_0(N)$. Thus, it
would be instructive to look at the ratio
${\rm pen}_{\rm rhm}(N)/{\rm pen}_{\rm ure}(N)$. If we suppose for a moment that the
distribution of $\eta_N$ can be approximated by a Gaussian law, then we get
from (1.11)

    $U_0(N) \approx \bar U_0(N) = \sqrt{2\Sigma_N\log[\Sigma_N/(\pi\sigma_1^4)]}.$

Under the polynomial hypothesis [see (2.5), (2.6)] it is easy to check that

    $\bar U_0(N) = o\Big(\sum_{k=1}^{N}\sigma_k^2\Big), \qquad U_0(N) = o\Big(\sum_{k=1}^{N}\sigma_k^2\Big), \qquad N \to \infty.$



Nevertheless it is instructive to look at what is going on when $N$ is small.
Therefore we plotted in Figure 2 the functions

    $\rho(N) = \dfrac{{\rm pen}_{\rm rhm}(N)}{{\rm pen}_{\rm ure}(N)} = 1 + \dfrac{(1 + \alpha)U_0(N)}{\sum_{k=1}^{N}\sigma_k^2}, \qquad \bar\rho(N) = 1 + \dfrac{(1 + \alpha)\bar U_0(N)}{\sum_{k=1}^{N}\sigma_k^2}$

with $\alpha = 0.1$. Since we used the Monte Carlo method, the function $\rho(N)$
looks a little bit wiggly. The first display (direct estimation) shows that
$(1 + \alpha)U_0(N)$ is smaller than $\sum_{k=1}^{N}\sigma_k^2$, so this function
cannot substantially affect the performance of URE. On the other hand, the
second plot distinctly demonstrates that $(1 + \alpha)U_0(N)$ dominates
$\sum_{k=1}^{N}\sigma_k^2$ when $\sigma_k = \varepsilon k$. It means that in this
case RHM and URE may work quite differently. Note also that in the case of
inverse estimation the difference between $U_0(N)$ and its Gaussian
approximation $\bar U_0(N)$ may be significant for small $N$. Certainly,
$U_0(N)/\bar U_0(N) \to 1$ as $N \to \infty$, but very often numerical performance
of RHM strongly depends on the behavior of the penalty function for small
$N$, and this is why we used $U_0(N)$ in our method.

Fig. 2. The functions $\rho(N)$ (solid line) and $\bar\rho(N)$ (dashed line) for
direct ($\sigma_k \equiv \varepsilon$) and inverse ($\sigma_k = \varepsilon k$) estimation.



2.4. The risk hull approach and URE. Let us finish this section with 
a discussion of the URE method, which can be also viewed as a risk hull 
method. The following theorem justifies this idea. 



Theorem 3. There exists a constant $C_u$ such that for any $\alpha > 0$



,iV) = (! + «) 



N 



Lfc=7V+l fc=l 



Oi 



Cu 2 



is a risk hull. 
It is clear that the data-driven bandwidth choice $N_{\rm ure}$ defined by (1.7)
can be viewed as the minimization of the risk hull $\ell_{\rm ure}(\theta, N)$. The
following theorem provides an upper bound for the risk of this method.

Theorem 4. There exist constants $C^* > 0$ and $\gamma_0 > 0$ such that for all
$\gamma \in (0, \gamma_0]$

(2.10)    $E_\theta\|\hat\theta(N_{\rm ure}) - \theta\|^2 \le (1 + \gamma)\inf_N R(\theta, N) + \cdots$

This result rectifies Theorem 1 in [5]. It shows in particular that there
is no logarithmic factor in the corresponding oracle inequality. At first
glance it seems that the URE method may work better than RHM. This naive
idea is motivated by the fact that

    $\inf_N R(\theta, N) \le \inf_N R_{\rm rhm}(\theta, N).$

Recall that the left-hand side of this display represents the main term of the
upper bound (2.10) while the right-hand side is the principal term of (1.12).
But the real situation is not so trivial. In order to compare the bounds (2.10)
and (1.12), we should take into account the remainder terms defined by
constants $C^*$ and $C_*$. Both these constants depend on $\beta$ but their
statistical nature and behavior are quite different, which follows from
inspection of the proofs of Theorems 1 and 4. The constant $C^*$ may be very
large even for $\beta \ge 1$ whereas $C_*$ remains moderate. We shall clearly see
this phenomenon in the following section devoted to numerical simulations, but
now let us discuss at the heuristic level the principal difficulties of URE.
The basic idea of this method is that
$R(y, N) = -\sum_{k=1}^{N} y_k^2 + 2\sum_{k=1}^{N}\sigma_k^2$ is a good estimator
for $E_\theta R(y, N) = -\sum_{k=1}^{N}\theta_k^2 + \sum_{k=1}^{N}\sigma_k^2$. In order
to see that this idea may fail, it suffices to look at the variance

    $E_\theta[R(y, N) - E_\theta R(y, N)]^2 \ge 2\sum_{k=1}^{N}\sigma_k^4.$
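The variance bound can be checked directly from the sequence model (1.3). The
following short derivation is a sketch that uses only quantities already
defined, together with $E\xi_k^3 = 0$ and ${\rm Var}(\xi_k^2) = 2$ for standard
Gaussian $\xi_k$.

```latex
% Uses y_k = \theta_k + \sigma_k \xi_k from (1.3) and R(y,N) from (1.7).
\[
R(y,N) - E_\theta R(y,N)
 = -\sum_{k=1}^{N}\big[\,2\theta_k\sigma_k\xi_k + \sigma_k^2(\xi_k^2 - 1)\big],
\]
\[
E_\theta\big[R(y,N) - E_\theta R(y,N)\big]^2
 = \sum_{k=1}^{N}\big(4\theta_k^2\sigma_k^2 + 2\sigma_k^4\big)
 \;\ge\; 2\sum_{k=1}^{N}\sigma_k^4 .
\]
```

The cross term vanishes because $E[\xi_k(\xi_k^2 - 1)] = 0$, and the lower bound
is attained at $\theta = 0$.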

So, $R(y, N)$ might be considered a good estimator if

    $E_\theta R(y, N) \ge 2\Big(2\sum_{k=1}^{N}\sigma_k^4\Big)^{1/2}.$

This entails, in particular, that the following inequality should hold:

(2.11)    $\sum_{k=1}^{N}\sigma_k^2 \ge 2\Big(2\sum_{k=1}^{N}\sigma_k^4\Big)^{1/2},$

for all $N \ge 1$. Notice that the factor 2 in the above inequality is, in some
sense, very optimistic. In fact, it should be replaced by a function which
tends to infinity as $N \to \infty$. However, let us suppose that
$\sigma_k = \varepsilon k^\beta$ and look for integers $N_\beta$ for which (2.11)
starts to work. For $\beta = 0$, we get $N_0 = 8$, for $\beta = 1$, $N_1 = 14$ and
so on. It is easy to see that URE will always choose a bandwidth of order at
least $N_\beta$. This evidently results in the risk order
$\varepsilon^2 N_\beta^{2\beta+1}$. We would like to draw attention to the fact that
this lower bound does not depend on the risk of the oracle
$\inf_N R(\theta, N)$. The latter may be small while
$\varepsilon^2 N_\beta^{2\beta+1}$ is large. Thus, roughly speaking, URE works well
when

    $\inf_N R(\theta, N) \ge \varepsilon^2 N_\beta^{2\beta+1}.$

Otherwise it fails. Unfortunately, the factor $N_\beta^{2\beta+1}$ is large even for
moderate $\beta$; for $\beta = 1$ it is of order $10^3$.
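The thresholds $N_\beta$ quoted above can be reproduced with a few lines of
code. The sketch below is an illustration (the function name and the search
limit `n_max` are assumptions); it tests the squared form of (2.11), in which
$\varepsilon$ cancels, so that integer $\beta$ is handled in exact arithmetic.

```python
def first_valid_bandwidth(beta, n_max=10_000):
    """Smallest N with sum k^(2b) >= 2 * sqrt(2 * sum k^(4b)) for sigma_k = eps * k^b.

    eps cancels from both sides of (2.11); the test is done on squares,
    (sum k^(2b))^2 >= 8 * sum k^(4b), so integer beta uses exact arithmetic.
    """
    s2 = s4 = 0
    for n in range(1, n_max + 1):
        s2 += n ** (2 * beta)
        s4 += n ** (4 * beta)
        if s2 * s2 >= 8 * s4:
            return n
    raise ValueError("no bandwidth up to n_max satisfies (2.11)")


for beta in (0, 1):
    n_beta = first_valid_bandwidth(beta)
    print(beta, n_beta, n_beta ** (2 * beta + 1))
```

For $\beta = 0$ and $\beta = 1$ this returns 8 and 14, in agreement with the
text, and $14^3 = 2744$ shows the size of the factor $N_\beta^{2\beta+1}$ for
$\beta = 1$.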

The second almost evident fact is that the bandwidth $N$ of the best
projection method is typically small when we deal with ill-posed problems.
For instance, consider the minimax recovering of vectors $\theta$ from the
Sobolev ball

    $W_m(L) = \Big\{\theta : \sum_{k=1}^{\infty}\theta_k^2 k^{2m} \le L\Big\}.$

Then it is easy to see (see, for details, [5]) that $N$ is of order
$\varepsilon^{-2/(2m+2\beta+1)}$. Thus, when $\beta = 1$ and $m = 1$, this term is of
order $\varepsilon^{-2/5}$. Therefore even for a very small noise level
$\varepsilon = 10^{-3}$, $N$ will not be larger than 20. Combining this with the
previous remark, we see that in this case URE may not work properly. From an
asymptotic viewpoint everything goes smoothly, but unfortunately asymptotic
arguments start to work for very small $\varepsilon$.
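The order of the best bandwidth is easy to tabulate for a few noise levels.
The sketch below is only illustrative: the function name is hypothetical, and
the multiplicative constant depending on $L$ is ignored, so the printed values
indicate orders of magnitude rather than exact bandwidths.

```python
def oracle_bandwidth_order(eps, m, beta):
    """Order eps^(-2/(2m+2beta+1)) of the best bandwidth on a Sobolev ball (constants ignored)."""
    return eps ** (-2.0 / (2 * m + 2 * beta + 1))


for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    print(eps, round(oracle_bandwidth_order(eps, m=1, beta=1), 1))
```

For $m = \beta = 1$ the order stays below 20 down to $\varepsilon = 10^{-3}$,
which is comparable with the threshold $N_1 = 14$ below which the URE
criterion is unreliable.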

3. Simulations. In this section we present some numerical properties of
the RHM approach. Numerical testing of nonparametric statistical methods
is a very difficult and delicate problem. The goal of this section is very
modest. We would like to illustrate graphically Theorems 1 and 4. To do
that, we propose to measure statistical performance of a method $\bar N$ by
oracle efficiency defined by

    $r_{\rm or}(\theta, \bar N) = \dfrac{\inf_N E_\theta\|\hat\theta(N) - \theta\|^2}{E_\theta\|\hat\theta(\bar N) - \theta\|^2}.$

It should be mentioned that we use the inverse of the ratio $r(\theta)$ from (1.8)
since we want to get a good graphical representation of the performance.
We have seen in the Introduction that $r(\theta)$ may vary from 1 to 2000 for the
URE method. This results in a degenerate plot of $r(\theta)$. Therefore, in order
to avoid this effect, we use $r_{\rm or}(\theta, \bar N)$ instead of $r(\theta)$.
Fig. 3. Oracle efficiency of URE and of RHM for direct estimation ($\sigma_k \equiv \varepsilon$).

Since it is evidently impossible to compute the oracle efficiency for all
$\theta \in \ell_2$, we choose a sufficiently representative family of vectors
$\theta$. In what follows we will use the linear family



where $a$ defines amplitude, $W$ bandwidth and $m$ smoothness.

We shall vary $a$ in a large range and plot $r_{\rm or}(\theta^a, \bar N)$ as a
function of $a$ which is directly related to the signal-to-noise ratio in the
considered model. The parameters $m = 6$ and $W = 6$ are fixed. In other
examples of $(W, m)$ the authors looked at, simulations showed that the oracle
efficiency exhibits similar behavior.

Fig. 4. Oracle efficiency of URE and of RHM for first-order derivative estimation
($\sigma_k = \varepsilon k$); horizontal axes: amplitude.

Fig. 5. Oracle efficiency of URE and of RHM for second-order derivative estimation
($\sigma_k = \varepsilon k^2$); horizontal axes: amplitude.

Two methods of data-driven bandwidth choice will be compared: URE
and RHM with $\alpha = 1.1$. It is easy to see that for these methods
$r_{\rm or}(\theta^a, \bar N)$ does not depend on $\varepsilon$. This function was
computed by the Monte Carlo method with 40,000 replications. We start with
direct estimation where $\sigma_k \equiv \varepsilon$. Figure 3 shows the oracle
efficiency of URE (left panel) and the oracle efficiency of RHM (right panel).
Comparing these plots, one can say that both methods work reasonably well.
However, if we deal with an inverse problem such as derivative estimation, we
can see a significant difference between these methods. The corresponding
oracle efficiencies are plotted on the left and right panels of Figure 4. For
small values of $a$ the performance of URE is very poor, whereas RHM
demonstrates very stable behavior. For very large $a = 500$ the oracle
efficiency of URE is of order 0.16, while RHM always has efficiency greater
than 0.4. Figure 5 deals with the case when the inverse problem becomes really
ill-posed ($\sigma_k = \varepsilon k^2$). In this situation
URE fails completely. Its maximal oracle efficiency is of order 3 * 10^'^. 
Nevertheless, RHM has a good efficiency (greater than 0.3). In the context
of Theorem 1 and Theorem 4 this example shows that the constants $C_*$ and
$C^*$ are quite different: while $C_*$ is small, $C^*$ is really large.
Unfortunately, it means that the terms which are asymptotically small in
Theorem 4 may easily dominate the oracle risk.

Let us finish this section with a short discussion of the role played by
$\alpha$. In the previous numerical simulations this parameter was 1.1. What
happens if we set this parameter to 0? The answer depends on $\beta$. If $\beta$
is small, $\beta \le 1$, everything goes smoothly. However, even for $\beta = 2$
this choice results in an unstable procedure. On the other hand, taking
$\alpha$ to be large leads to poor performance of RHM.



4. Proofs. 
4.1. Proof of Theorem 2.

Proof of Lemma 1. Denote for brevity 



and 



    $\Phi_N(t) = E\exp(it\kappa_N).$

We begin with an upper bound for the absolute value of $\Phi_N(t)$. Recalling
the definition of $\eta_N$ and using (2.5), we have



\^N{t)\ <exp 
< exp 



1 



N 



1=1 



-'N 



-^log 1 + 



Sat 



With this inequality, we have that for all x < y/N/C 



\t\>x 



\^N{t)\dt < 



(4.1) 



< 



x<\t\<,/2N/C 

exp[-Ct^]dt + 



\t\>x 



, \'^Nit)\dt 

t\>\/2N/C 



t\>y/2Njc\ N 



< exp[-Cx2] + W_2-^/8 < exp[-Cx2]. 
8C 

Let us fix an integer M. Then by the Taylor formula we get that for all 
\t\ < ^NjC 

$iv(t) = exp 



s=3 



M 



where R, = {Y.^Y'l'^Y.t^af . 

It follows easily from (2.5) that o% < Ca%i^. This gives jii^l x Ar-«/2+i. 
Therefore, expanding $7v(i) exp(i2/2) into Taylor series, it is easy to see 
that there exist functions Qm{s, N),s = 3, . . . , M, uniformly bounded in 
and s such that 



A/-1 



(4.2) ^>^(0exp(tV2) = l + iV Qm{s,N) 

s=3 



it 



+ N 
Define now the following approximation of $\Phi_N(t)$:
<(t) = exp(-tV2) 



A/-1 
s=3 



it 



Now we can approximate the probability P(KAr > x) by 

(4.3) P^^(x) = / pfj{v)dv withp^{v) = — exp{-itv)^fj{t)dt. 



Notice that 
(4.4) Pifix) 
where 



— T.i-^rQMis,N)k-^/'^ 



s=3 



dx^ 



j-exp(-x /2), 



1 f°° 

<t>{x) = J exp(-uV2) du. 

Then by the Parseval identity and (4.1), (4.2), we obtain 
\P{kn>x)-P^\x)\ 

<-j_jtn'^^it)-'^N{t)\dt 



(4.5) 



< 



t\<y/Njc 

C 



dt + 



\t\>y/Njc 



dt 



+ exp[-CAf] < 



C 



Using (4.3) and (4.5), it is easy to see that 



(4.6) 



P{KN>x)>P^^ix 



C 



Now we are ready to complete the proof of the lemma. Since the function
$F(x) = E\kappa_N\mathbf{1}\{\kappa_N > x\}$ is a monotone nonincreasing function in
$x \ge 0$, we need to check that for sufficiently large $N$ [see (1.11) and (2.8)]



EknI{kn>ui{N)) > 



y2S 



N 



It follows from the above equation and integration by parts that it suffices 
to show that 



(4.7) ui{N)P{KN>ui{N)) + 



ui{N)+l ^2 

P{kn >x)dx> — 
Using (4.6), we bound the left-hand side as

pui{N)+l 

ui{N)P{kn>ui{N))+ P{KN>x)dx 

Jui{N) 

(4.8) 

^ ' ;>oo j'OO 

= ui{N)(j)[ui{N)] + (p{x)dx- (l){x)dx 

Jui{N) Jui{N)+l 

-(l + ni(iV)) ^ max \P^^ (^x) - <P{x)\ - 

ui(N)<x<ui{N)+l TV"-''^ 

Integrating by parts, we get 

poo 1 

(4.9) u^{N)<t>[u,m + / </>(x)dx = -=e-"?(^)/2 = ^=^. 

Jui{N) V2tt V^^iV 

Noticing that in view of (2.7) 



(4.10) y^log(Af/(2^)) < ui{N) < C^log{N) 

and integrating by parts, we have as — > oo 

^(x)dx<Ce-("^(^)+i)^/2<^^' 

lui{N)+l 

and by (4.4) and (4.10), 



(4.11) r 0(x)dx<Ce-("^W+i)'/2<^ie-"^(^) = o('^iy 

Jui{N)+l V2SAr Vv2i;Ar/ 



(l + ni(iV)) ^ max \P^'{x)-m\ 
ui{N)<x<ui{N)+l 

(4.12) 

- .2^3/ 



< 



Caiui{N) ^J aj \ 



Finally, note that we can choose sufficiently large M such that [see (4.10) and (2.7)] 

Combining this equation with (4.8)-(4.12) we arrive at (4.7), thus finishing 
the proof of the lemma. □ 



Lemma 2. For some C > 

( ^ 



(4.13) P{??7v>a;}<exp(-^), 0<x<% 
Proof. Certainly, this fact is well known and we prove it only for the
reader's convenience. We use the inequality
$\log(y) \ge y - 1 - (1 - 1/y)^2/2$, $y \in (0, 1]$, which can be checked easily
since the first derivative in $y$ of $\log(y) - y + 1 + (1 - 1/y)^2/2$ is
negative. Therefore, for any positive $\lambda$

{ N ^ N >! 

Eexp(A??7v) = expj-A^af - -^log(l - 2Af7,f)+| 

( N 4 ^ 

(4.14) <exp A^EtI 



2\2 



1 = 1 1=1 ^ 

Then, by the Markov inequality, we have 

^{Vn > x] < exp(— Ax)Eexp(A77Ar) 

{N 4 ~| 

In order to prove (4.13), we take A = x/(8S7v)- D 

Define the auxiliary function

    $U(a) = -1 - \dfrac{1}{2a}\log(1 - 2a), \qquad a \in (0, 1/2).$

By the Taylor formula $U(a) = \sum_{k=2}^{\infty}(2a)^{k-1}/k$. This yields
immediately that $a \le U(a) \le a/(1 - 2a)$ and

(4.15)    $\dfrac{a}{1 + 2a} \le U^{-1}(a) \le a, \qquad a > 0,$

where $U^{-1}(a)$ denotes the inverse function.
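The two-sided bounds on $U$ and on its inverse can be checked numerically. The
sketch below is a sanity check rather than a proof; the function names and the
bisection tolerance are assumptions, and the inverse is computed by bisection
because $U$ is increasing on $(0, 1/2)$.

```python
import math


def aux_u(a):
    """U(a) = -1 - log(1 - 2a) / (2a) for a in (0, 1/2)."""
    return -1.0 - math.log(1.0 - 2.0 * a) / (2.0 * a)


def aux_u_inverse(v, tol=1e-12):
    """Inverse of U on (0, 1/2) by bisection; U is increasing there."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if aux_u(mid) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for a in (0.01, 0.1, 0.3, 0.45):
    assert a <= aux_u(a) <= a / (1.0 - 2.0 * a)
for v in (0.05, 0.5, 2.0, 10.0):
    assert v / (1.0 + 2.0 * v) <= aux_u_inverse(v) <= v
```

The inverse maps $(0, \infty)$ into $(0, 1/2)$, which is why the lower bound
$a/(1 + 2a)$ in (4.15) never exceeds $1/2$.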

Lemma 3. Let $S_N = \sum_{i=1}^{N} b_i^2(\xi_i^2 - 1) - U(a)\sum_{i=1}^{N} b_i^2$,
where $\xi_i$ are i.i.d. $\mathcal{N}(0, 1)$ and $b_i^2 \le 1$. Then for any
$a \in (0, 1/2)$

(4.16)    $E\sup_{N \ge 1} S_N \le a^{-1}$

and

(4.17)    $P\Big(\sup_{N \ge 1} S_N > x\Big) \le \exp(-ax).$

The proof follows from the Doob inequality (see, e.g., [13]).

Proof of Theorem 2. Define the exponential grid 
(4.18) ns=l{l+pV^y\, 

where p is a sufficiently small constant which will be chosen later on. By 
Lemma 1 and a simple algebra we have 

Esup{??7v-(l + a)f/o(iV)} 

TV 



(4.19) 



<J2B rnax [r,^ - + a)UoiN)l 



<5^E rnax [ry^v - (1 + a) V2Sfc MCS^^af)]. 



< EK - (1 + «) V25^"^ log(C7S„7at) + 



s=l 



where 



= max 

ns<N< 



E ^?(ef-i)-(i + «) 

\l=na-\-\ 



21og(CS„7(Tf 



2\/s 



■[Sat - S„J 



Denote for brevity 



21og(CS„ /af) y 

r, = (l + a) ^ ^ and A, = {\ ^ a)^\og{CY.^J a\). 

Then by (4.15) and (4.17) we obtain 
P{e. > x} 



P < max 



P < max 

ns<Af<ns+i 



TV TV 

i=ns+l 



j=n.+l 



> X 



TV ^2 



E i^(e.^-i)-r.<,, E 



TV 4 

0-7 



i=ns+l "-s+i i=ns+l ' 

< exp[-t/-i(r,a2 )x/ct2 1 < exp[-r,(l - 2r,a2 )x], 



> 



X 



0-^ 



2 



or equivalently 



(4.20) P{ y= > ^} < exp[-^(S„7S„^^ji/2(l - 2r,<^Jx]. 
Notice that $1 - 2\Gamma_s\sigma_{n_{s+1}}^2 > 0$ for sufficiently large $s$. Using
(4.20) and integrating by parts, we get



EK - (1 + a)V2Sn. log(CS„,K) + e,] + 



A, 



(4.21) 



As I V 2i;„^ V 2i;„^ 



<y^^"Eexp|-^J^^(l-2r, 



X Eexp|^(l - 2r,<^J^^=|. 

In order to bound from above the last term in this inequality, we have by 
(4.14) that for any positive A 



E 



exp{A7?„ J < expi g a\ + 4A3 g _ ''^^^2Vi 

\ i=l j=l ^ * ' + 



Using this inequality with A = ^^(1 — 2V sa\^^_^ ) / ■^2S„^_^^ and noticing that 

by virtue of the polynomial hypothesis A^ Y^L\ c'j^(l ~ 2A(T?)],]^ < C, we im- 
mediately get 

Eexp|A.-^^(l - 2r.<^j| < C^exp|^^(l - 2r.<^j|. 
Therefore, combining this with (4.21), we obtain 



E[r/„^ - (1 + a)pY.ns \og{CY.^Ja\) + + 
(4.22) < C^^^^^exp{-^(1 - 2r.<^J (2^ 



<C^^^exp{-|(l-2r.<^J 



Let us choose now the parameter p of the exponential grid. Note that by 
(2.5) 



< 



(l-p^)2- 
Thus it is clear that we can always choose a sufficiently small p such that

Hence from (4.22) we get 

EK - (1 + «)\/2Sn. log(CS„,M) + e,]^ 

< CA;i7s;;^exp[-(l + a - 2a2)(i _ 2r,<^ J logl^C^J^)] 

(4.23) 

< CafA;' exp[-a(l - 2a) log(y^CS„,/(74 )] 

In the above inequality we used the fact that $\Lambda_s\ge(1+\alpha)\sqrt{\log(n_s)}$ and that
$\Gamma_s\sigma_{n_{s+1}}^2\log(C\Sigma_{n_s}/\sigma_1^4)$ is uniformly bounded in $s$. Finally, substituting (4.23)
in (4.19), we have

4.24 Esup{??;v - 1 + a)Uo{N)} < -^Y. r ^ ^ 

thus proving the theorem. □ 



4.2. Proofs of Theorems 1, 3 and 4. We start with two technical lemmas. 
Their proofs can be found in [13]. 

Lemma 4. Let k > 1 be an integer random variable. Then for any N =
1, 2, ...

oo I" oo oo 1/2 

E aAi^ > - 3a2,E ^ 02 ^ 3Ea2 ^ . 

j=K I i=K i=N ) 

Lemma 5. For any $Q\in(1/2,(2\beta+1)/(4\beta+1)]$ there exist constants
$C(Q)>0$ and $\alpha(Q)>0$ such that for all $\alpha\in(0,\alpha(Q))$ the following inequality holds:

(4.25) E sup ( ^ - « f 1 < C(Q)a- V W-i) . 



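Proof of Theorem 1. In view of Theorem 2, for any $\mu>0$

$$
\ell_\mu(\theta,N)=\sum_{i=N+1}^{\infty}\theta_i^2+\sum_{i=1}^{N}\sigma_i^2+(1+\mu)U_0(N)+\frac{C\sigma_1^2}{\mu}
$$

is a risk hull, and therefore

$$
\mathbf{E}_\theta\|\hat\theta(N_{\mathrm{rhm}})-\theta\|^2\le \mathbf{E}_\theta\,\ell_\mu(\theta,N_{\mathrm{rhm}}).
\tag{4.26}
$$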

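On the other hand, since $N_{\mathrm{rhm}}$ minimizes $R_{\mathrm{pen}}(y,N)$ [see (1.9)], we have for
any integer $N$

$$
\mathbf{E}_\theta R_{\mathrm{pen}}(y,N_{\mathrm{rhm}})\le \mathbf{E}_\theta R_{\mathrm{pen}}(y,N)=R_{\mathrm{rhm}}(\theta,N)-\|\theta\|^2.
\tag{4.27}
$$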

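In order to combine the inequalities (4.26) and (4.27), we rewrite $\ell_\mu(\theta,N_{\mathrm{rhm}})$
in terms of $R_{\mathrm{pen}}(y,N_{\mathrm{rhm}})$,

$$
\begin{aligned}
&R_{\mathrm{pen}}(y,N_{\mathrm{rhm}})+\|\theta\|^2+\frac{C\sigma_1^2}{\mu}\\
&\qquad=\ell_\mu(\theta,N_{\mathrm{rhm}})-2\sum_{i=1}^{N_{\mathrm{rhm}}}\sigma_i\theta_i\xi_i-\sum_{i=1}^{N_{\mathrm{rhm}}}\sigma_i^2(\xi_i^2-1)+(\alpha-\mu)U_0(N_{\mathrm{rhm}}).
\end{aligned}
$$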

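Therefore, using this equation and (4.26), (4.27), we obtain that for any
integer $N$

$$
\begin{aligned}
\mathbf{E}_\theta\|\hat\theta(N_{\mathrm{rhm}})-\theta\|^2
&\le R_{\mathrm{rhm}}(\theta,N)+\frac{C\sigma_1^2}{\mu}+2\mathbf{E}_\theta\sum_{i=1}^{N_{\mathrm{rhm}}}\sigma_i\theta_i\xi_i\\
&\quad+\mathbf{E}_\theta\Biggl[\sum_{i=1}^{N_{\mathrm{rhm}}}\sigma_i^2(\xi_i^2-1)-(\alpha-\mu)U_0(N_{\mathrm{rhm}})\Biggr].
\end{aligned}
\tag{4.28}
$$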



Our next step is to control the last two terms in the above inequality. By
Lemma 4 we have that for any data-driven bandwidth N



N 



(4.29) 



Eg E'^j^i^i = "Eg E '^iOiS.i 
«=1 i=N+l 



oo \ 1/2 
2 1 



<2|a7v| Ee E 



=iV+l 



/ oo \ 1/2 

2 E q 

\i=N+l I 



E,4. 



Noticing that by (2.5) $\sigma_N^2\le C\sigma_1^2\bigl(\sigma_1^{-2}\sum_{i=1}^{N}\sigma_i^2\bigr)^{2\beta/(2\beta+1)}$ and using the Young
inequality,

$$
xy^{r}\le ry+(1-r)x^{1/(1-r)},\qquad r\in(0,1),
\tag{4.30}
$$

with $r=1/2$ and (4.29), we get that for any $\gamma>0$

N 

E^E'^i^j^j 

i=l 
<C\ai 



N \ /3/(2/3+l) / oo \ 1/2 

E 



=Af+l 



N \ /3/(2/3+l) 



oo oo 

<7 E ^' + 7Ee E ' 

i=N+l i=N+l 



=Af+l 



1/2 



+ 



2/(2/3+1) 



7 



(4/3+l)/(2/3+l) 



AT X 2/3/(2/3+1) 



7E^fc 

V i=l 



+ (^7EeE^i) 



TV \ 2/3/(2/3+1)- 



Once again using (4.30) with $r=2\beta/(2\beta+1)$, we continue the above inequality as follows:



TV 



i=l 



N 



N 



(4.31) <7 E E (^f + E-l + 



.i=N+l i=l 



>j=7V+l «=1 



N 



Caj 

-v4/3+l 



< ^R(Q, N) + 7E,||6i(iV) - 9f - ^EeJ2aM - 1) + 

i=i 
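The Young inequality (4.30), which drives the two steps above, is easy to probe numerically. The following is a minimal sketch, assuming nonnegative $x$ and $y$; the helper name `young_gap` and the grid of test points are illustrative and not taken from the paper. It evaluates the gap $ry+(1-r)x^{1/(1-r)}-xy^{r}$ for the two exponents used in the proof, $r=1/2$ and $r=2\beta/(2\beta+1)$ (here with $\beta=1$).

```python
def young_gap(x, y, r):
    """Return r*y + (1-r)*x**(1/(1-r)) - x*y**r.

    By the Young inequality this is nonnegative for x, y >= 0 and 0 < r < 1.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie in (0, 1)")
    if x < 0 or y < 0:
        raise ValueError("x and y must be nonnegative")
    return r * y + (1.0 - r) * x ** (1.0 / (1.0 - r)) - x * y ** r


# Illustrative grid and the two exponents used in the proof (beta = 1 assumed).
beta = 1.0
exponents = (0.5, 2.0 * beta / (2.0 * beta + 1.0))
grid = [0.0, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
min_gap = min(young_gap(x, y, r) for r in exponents for x in grid for y in grid)
```

Equality holds when $y=x^{1/(1-r)}$; for instance `young_gap(2.0, 4.0, 0.5)` evaluates to zero.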

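Therefore, substituting (4.31) in (4.28) and then using (4.24), we obtain

$$
\begin{aligned}
(1-\gamma)\mathbf{E}_\theta\|\hat\theta(N_{\mathrm{rhm}})-\theta\|^2
&\le (1+\gamma)R_{\mathrm{rhm}}(\theta,N)+\frac{C\sigma_1^2}{\mu}+\frac{C\sigma_1^2}{\gamma^{4\beta+1}}\\
&\quad+(1-\gamma)\mathbf{E}_\theta\Biggl[\sum_{i=1}^{N_{\mathrm{rhm}}}\sigma_i^2(\xi_i^2-1)-\frac{\alpha-\mu}{1-\gamma}U_0(N_{\mathrm{rhm}})\Biggr]\\
&\le (1+\gamma)R_{\mathrm{rhm}}(\theta,N)+\frac{C\sigma_1^2}{\mu}+\frac{C\sigma_1^2}{\gamma^{4\beta+1}}+\frac{C\sigma_1^2}{(\alpha-\mu+\gamma-1)_+}.
\end{aligned}
$$

Finally, choosing $\mu=\gamma$ completes the proof. □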



Proof of Theorem 3. It suffices to show that for any sufficiently
small $\alpha>0$

$$
\mathbf{E}\sup_{k}\Biggl\{\eta_k-\alpha\sum_{i=1}^{k}\sigma_i^2\Biggr\}\le \frac{C\sigma_1^2}{\alpha^{4\beta+1}}.
$$

In view of (2.6) the proof follows immediately from Lemma 5 with $Q=(2\beta+1)/(4\beta+1)$. □

Proof of Theorem 4. This follows the main lines of the proof of 
Theorem 1. By Theorem 3 we have 

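$$
\mathbf{E}_\theta\|\hat\theta(N_{\mathrm{ure}})-\theta\|^2\le \mathbf{E}_\theta\,\ell_{\mathrm{ure}}(\theta,N_{\mathrm{ure}})=(1+\alpha)\mathbf{E}_\theta R(\theta,N_{\mathrm{ure}})+\frac{C\sigma_1^2}{\alpha^{4\beta+1}}.
\tag{4.32}
$$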



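Since $N_{\mathrm{ure}}$ minimizes $-\sum_{i=1}^{N}y_i^2+2\sum_{i=1}^{N}\sigma_i^2$, we get for any integer $N$

$$
-\sum_{i=1}^{N_{\mathrm{ure}}}y_i^2+2\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i^2\le -\sum_{i=1}^{N}y_i^2+2\sum_{i=1}^{N}\sigma_i^2.
\tag{4.33}
$$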
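The selection rule behind (4.33) can be sketched in a few lines. This is a minimal illustration, assuming the observations $y_i$ and noise levels $\sigma_i$ are available as finite lists (a truncation that the text does not discuss); `ure_bandwidth` is a hypothetical helper name, not the authors' implementation.

```python
from itertools import accumulate


def ure_bandwidth(y, sigma):
    """Minimize -sum_{i<=N} y_i^2 + 2 * sum_{i<=N} sigma_i^2 over N = 1..len(y).

    Returns the minimizing N (1-based; the smallest one in case of ties)
    together with the list of criterion values.
    """
    if len(y) != len(sigma) or len(y) == 0:
        raise ValueError("y and sigma must be nonempty and of equal length")
    # Increment of the criterion when the bandwidth grows from N-1 to N.
    increments = [2.0 * s * s - v * v for v, s in zip(y, sigma)]
    criterion = list(accumulate(increments))
    n_ure = min(range(len(criterion)), key=criterion.__getitem__) + 1
    return n_ure, criterion


# Toy data: two coefficients clearly above the noise level, then pure noise scale.
n_ure, criterion = ure_bandwidth([3.0, 2.0, 0.1, 0.1], [1.0, 1.0, 1.0, 1.0])
```

By construction, the criterion at the returned bandwidth is no larger than at any other bandwidth in the list, which is exactly the property stated in (4.33).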



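Note also that

$$
R(\theta,N_{\mathrm{ure}})=\|\theta\|^2-\sum_{i=1}^{N_{\mathrm{ure}}}y_i^2+2\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i^2
+2\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i\theta_i\xi_i+\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i^2(\xi_i^2-1).
$$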
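The decomposition in the last display is a purely algebraic identity and can be verified on simulated data. The sketch below assumes the sequence-space model $y_i=\theta_i+\sigma_i\xi_i$ and the risk $R(\theta,N)=\sum_{i>N}\theta_i^2+\sum_{i\le N}\sigma_i^2$, with all sequences truncated to a finite length; the function names and the particular $\theta$, $\sigma$ are illustrative assumptions.

```python
import random


def risk(theta, sigma, n):
    """R(theta, N): squared bias of the tail plus variance of the first N terms."""
    return sum(t * t for t in theta[n:]) + sum(s * s for s in sigma[:n])


def risk_via_data(theta, sigma, xi, n):
    """Right-hand side of the identity, built from y_i = theta_i + sigma_i * xi_i."""
    y = [t + s * z for t, s, z in zip(theta, sigma, xi)]
    norm2 = sum(t * t for t in theta)
    unbiased_part = -sum(v * v for v in y[:n]) + 2.0 * sum(s * s for s in sigma[:n])
    cross = 2.0 * sum(s * t * z for t, s, z in zip(theta[:n], sigma[:n], xi[:n]))
    chi2_part = sum(s * s * (z * z - 1.0) for s, z in zip(sigma[:n], xi[:n]))
    return norm2 + unbiased_part + cross + chi2_part


random.seed(0)
dim = 40
theta = [1.0 / (i + 1) for i in range(dim)]   # illustrative decaying coefficients
sigma = [float(i + 1) for i in range(dim)]    # illustrative polynomial growth
xi = [random.gauss(0.0, 1.0) for _ in range(dim)]
max_error = max(
    abs(risk(theta, sigma, n) - risk_via_data(theta, sigma, xi, n))
    for n in range(dim + 1)
)
```

Since the identity holds for every realization of the noise and every bandwidth, it also holds for a data-driven bandwidth, which is how it is used together with (4.33).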

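Therefore, combining this display and (4.33), we see that for any $N\ge 1$

$$
\mathbf{E}_\theta R(\theta,N_{\mathrm{ure}})\le R(\theta,N)+2\mathbf{E}_\theta\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i\theta_i\xi_i+\mathbf{E}_\theta\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i^2(\xi_i^2-1).
\tag{4.34}
$$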

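In order to control the interference term $\mathbf{E}_\theta\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i\theta_i\xi_i$, we use (4.31) with
the data-driven bandwidth equal to $N_{\mathrm{ure}}$. This yields

$$
\mathbf{E}_\theta\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i\theta_i\xi_i\le \alpha R(\theta,N)+\alpha\mathbf{E}_\theta\|\hat\theta(N_{\mathrm{ure}})-\theta\|^2-\alpha\mathbf{E}_\theta\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i^2(\xi_i^2-1)+\frac{C\sigma_1^2}{\alpha^{4\beta+1}}.
$$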

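Substituting this in (4.34), we have

$$
\begin{aligned}
\mathbf{E}_\theta R(\theta,N_{\mathrm{ure}})
&\le (1+2\alpha)R(\theta,N)+2\alpha\mathbf{E}_\theta\|\hat\theta(N_{\mathrm{ure}})-\theta\|^2\\
&\quad+(1-2\alpha)\mathbf{E}_\theta\sum_{i=1}^{N_{\mathrm{ure}}}\sigma_i^2(\xi_i^2-1)+\frac{C\sigma_1^2}{\alpha^{4\beta+1}}.
\end{aligned}
\tag{4.35}
$$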

The last term in the above inequality can be controlled by Lemma 5, which
gives that for any sufficiently small $\alpha>0$ and $Q\in(1/2,(2\beta+1)/(4\beta+1)]$,



err / aV{2Q-i) • 
1=1 ^ / 
Let $Q=(2\beta+1)/(4\beta+1)$. Then by (2.6)



2 

i 1 

i=l / i=l 



and thus we obtain 
(4.36) E, ''M - 1) < Q'Eg ^ af + . 



j = l 4 = 1 



Finally, combining the above equation with (4.35), (4.36) and (4.32), we 
complete the proof. □ 